{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# [HW 4] Kernel Ridge Regression Practice\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "import os\n",
                "import matplotlib.pyplot as plt\n",
                "import numpy as np\n",
                "from matplotlib import cm\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Make a result directory to store plots\n",
                "os.makedirs(\"./result\", exist_ok=True)\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "def lstsq(A, b, lambda_=0):\n",
                "    return np.linalg.solve(A.T @ A + lambda_ * np.eye(A.shape[1]), A.T @ b)\n",
                "\n",
                "\n",
                "def heatmap(f, clip=True):\n",
                "    # example: heatmap(lambda x, y: x * x + y * y)\n",
                "    xx = yy = np.linspace(np.min(X), np.max(X), 72)\n",
                "    x0, y0 = np.meshgrid(xx, yy)\n",
                "    x0, y0 = x0.ravel(), y0.ravel()\n",
                "    z0 = f(x0, y0)\n",
                "\n",
                "    if clip:\n",
                "        z0[z0 > 5] = 5\n",
                "        z0[z0 < -5] = -5\n",
                "\n",
                "    plt.hexbin(x0, y0, C=z0, gridsize=50, cmap=cm.jet, bins=None)\n",
                "    plt.colorbar()\n",
                "    cs = plt.contour(\n",
                "        xx, yy, z0.reshape(xx.size, yy.size), [-2, -1, -0.5, 0, 0.5, 1, 2], cmap=cm.jet)\n",
                "    plt.clabel(cs, inline=1, fontsize=10)\n",
                "\n",
                "    pos = y[:] == +1.0\n",
                "    neg = y[:] == -1.0\n",
                "    plt.scatter(X[pos, 0], X[pos, 1], c='red', marker='+')\n",
                "    plt.scatter(X[neg, 0], X[neg, 1], c='blue', marker='v')\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "data_names = ['circle', 'heart', 'asymmetric']\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (a) Visualize the Datasets\n",
                "\n",
                "**Visualize all the datasets.\n",
                "Label the points with different $y$ values with different colors and/or shapes.**\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# data = np.load('circle.npz')\n",
                "# data = np.load('heart.npz')\n",
                "data = np.load('asymmetric.npz')\n",
                "\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "\n",
                "for i, dataset in enumerate(data_names):\n",
                "    data = np.load(dataset + '.npz')\n",
                "    X = data[\"x\"]\n",
                "    y = data[\"y\"]\n",
                "\n",
                "    plt.figure(figsize=[6,12])\n",
                "    plt.subplot(3,1,i+1)\n",
                "    ### start viz_data ###\n",
                "\n",
                "    ### end viz_data ###\n",
                "    plt.legend()\n",
                "    plt.title(dataset)\n",
                "plt.show();\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (b) Polynomial Regression (Non-kernel)\n",
                "\n",
                "**Implement polynomial ridge regression** to\n",
                "fit the datasets `circle.npz`, `asymmetric.npy`, and\n",
                "`heart.npz`.  Use the first 80% of the data as the training dataset and the\n",
                "last 20% of the data as the validation dataset.\n",
                "**Report both the average\n",
                "training squared loss and the average validation squared for polynomial order\n",
                "$p \\in \\{1, \\dots, 16\\}$.  Use the regularization term $\\lambda=0.001$ for all\n",
                "$p$.  Visualize your result and attach the heatmap plots for the\n",
                "learned predictions over the entire 2D domain for $p \\in \\{2, 4, 6, 8, 10,\n",
                "12\\}$ in your writeup.**\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# data = np.load('circle.npz')\n",
                "data = np.load('heart.npz')\n",
                "# data = np.load('asymmetric.npz')\n",
                "\n",
                "SPLIT = 0.8\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "X /= np.max(X)  # normalize the data\n",
                "\n",
                "n_train = int(X.shape[0] * SPLIT)\n",
                "X_train = X[:n_train:, :]\n",
                "X_valid = X[n_train:, :]\n",
                "y_train = y[:n_train]\n",
                "y_valid = y[n_train:]\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "def assemble_feature(x, D):\n",
                "    \"\"\"Create a vector of polynomial features up to order D from x\"\"\"\n",
                "    ### start poly_features ###\n",
                "\n",
                "    ### end poly_features ###\n",
                "    return poly_x\n",
                "\n",
                "\n",
                "LAMBDA = 0.001\n",
                "isubplot = 0\n",
                "plt.figure(figsize=[12,10])\n",
                "for D in range(1, 17):\n",
                "    ### start poly_nonkernel ###\n",
                "\n",
                "    ### end poly_nonkernel ###\n",
                "    if D in [2, 4, 6, 8, 10, 12]:\n",
                "        isubplot += 1\n",
                "        plt.subplot(3,2,isubplot)\n",
                "        heatmap(lambda x, y: assemble_feature(np.vstack([x, y]).T, D) @ w)\n",
                "        plt.title(\"D = %d\" % D)\n",
                "    print(\"p = {:2d}   train_error = {:10.6f}  validation_error = {:10.6f}  cond = {:14.6f}\".\n",
                "            format(D, error_train, error_valid, cond))\n",
                "plt.show();\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (c) Polynomial Kernel Ridge Regression\n",
                "\n",
                "**Implement kernel ridge regression** to fit the datasets\n",
                "`circle.npz`, `heart.npz`, and optionally (due to the\n",
                "computational requirements), `asymmetric.npz`. Use the polynomial\n",
                "kernel $K(\\vec x_i, \\vec x_j) = (1 + \\vec x_i^\\top \\vec x_j)^p$. Use the first\n",
                "80\\% data as the training dataset and the last 20\\% data as the validation\n",
                "dataset.  **Report both the average training squared loss and the average\n",
                "validation squared loss for polynomial order $p \\in \\{1,\\dots, 16\\}$.  Use the\n",
                "regularization term $\\lambda=0.001$ for all $p$. For\n",
                "`circle.npz`, also report the average training squared loss and\n",
                "validation squared loss for polynomial order $p \\in \\{1,\\dots, 24\\}$ when you\n",
                "use only the first 15\\% of data as the training dataset and the final 85\\% of data as\n",
                "the validation dataset.  Based on the error, comment on when you want\n",
                "to use a high-order polynomial in linear/ridge regression.**\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# data = np.load('circle.npz')\n",
                "data = np.load('heart.npz')\n",
                "# data = np.load('asymmetric.npz')\n",
                "\n",
                "SPLIT = 0.8\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "X /= np.max(X)  # normalize the data\n",
                "\n",
                "n_train = int(X.shape[0] * SPLIT)\n",
                "X_train = X[:n_train:, :]\n",
                "X_valid = X[n_train:, :]\n",
                "y_train = y[:n_train]\n",
                "y_valid = y[n_train:]\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "def poly_kernel(X, XT, D):\n",
                "    \"\"\"Create the polynomial order D kernel matrix from X and X^T\"\"\"\n",
                "    ### start poly_kernel_helper ###\n",
                "\n",
                "    ### end poly_kernel_helper ###\n",
                "    return K\n",
                "\n",
                "isubplot = 0\n",
                "plt.figure(figsize=[12,10])\n",
                "for D in range(1, 16):\n",
                "    ### start poly_kernel ###\n",
                "\n",
                "    ### end poly_kernel ###\n",
                "plt.show();\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "_Your comments here..._\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (d) Diminishing Influence of the Prior with Growing Data\n",
                "\n",
                "With increasing of amount of data, the gains from regularization diminish.\n",
                "**Sample a subset of training data from the first 80\\% of data from\n",
                "\\texttt{asymmetric.npz} and use the data from the last 20\\% of data for\n",
                "validation. Make a plot whose $x$ axis is the amount of the training\n",
                "data and $y$ axis is the validation squared loss of the non-kernelized ridge\n",
                "regression algorithm.\n",
                "Include 6 curves for hyper-parameters $\\lambda \\in \\{0.0001,\n",
                "0.001, 0.01\\}$ and $p = \\{5, 6\\}$.**\n",
                "Your plot should demonstrate that with same\n",
                "$p$, the validation squared loss will converge with enough data, regardless of\n",
                "the choice of $\\lambda$.  You can use log scaling on the\n",
                "$x$ axis for clarity, and you need to resample the data multiple times for a\n",
                "given $p$, $\\lambda$, and amount of training data in order to get a smooth curve.\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "data = np.load('asymmetric.npz')\n",
                "\n",
                "SPLIT = 0.8\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "X /= np.max(X)  # normalize the data\n",
                "\n",
                "n_train = int(X.shape[0] * SPLIT)\n",
                "X_train = X[:n_train:, :]\n",
                "X_valid = X[n_train:, :]\n",
                "y_train = y[:n_train]\n",
                "y_valid = y[n_train:]\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "### start vanishing_prior ###\n",
                "\n",
                "### end vanishing_prior ###\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (e) RBF Kernel Ridge Regression\n",
                "\n",
                "A popular kernel function that is widely used in various kernelized\n",
                "learning algorithms is called the radial basis function kernel (RBF kernel).\n",
                "It is defined as\n",
                "\\begin{equation} K(\\mathbf{x}, \\mathbf{x}') = \\exp \\left(-\\frac{\\lVert\n",
                "\\mathbf{x}-\\mathbf{x}'\\rVert_2^2}{2\\sigma^2}\\right).\n",
                "\\end{equation}\n",
                "**Implement the RBF kernel function for kernel ridge regression to fit the dataset\n",
                "`heart.npz`.  Use the regularization term $\\lambda=0.001$.\n",
                "Report the average squared loss, visualize your result and attach the\n",
                "heatmap plots for the fitted functions over the 2D domain for $\\sigma \\in \\{10,\n",
                "3, 1, 0.3, 0.1, 0.03\\}$ in your writeup.**\n",
                "You may want to vectorize your kernel\n",
                "functions to speed up your implementation. **Comment on the effect of\n",
                "$\\sigma$.**\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# data = np.load('circle.npz')\n",
                "data = np.load('heart.npz')\n",
                "# data = np.load('asymmetric.npz')\n",
                "\n",
                "SPLIT = 0.8\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "X /= np.max(X)  # normalize the data\n",
                "\n",
                "n_train = int(X.shape[0] * SPLIT)\n",
                "X_train = X[:n_train:, :]\n",
                "X_valid = X[n_train:, :]\n",
                "y_train = y[:n_train]\n",
                "y_valid = y[n_train:]\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "def rbf_kernel(X, XT, sigma):\n",
                "    ### start rbf_kernel_helper ###\n",
                "\n",
                "    ### end rbf_kernel_helper ###\n",
                "    return K\n",
                "\n",
                "plt.figure(figsize=[12,10])\n",
                "isubplot = 0\n",
                "for sigma in [10, 3, 1, 0.3, 0.1, 0.03]:\n",
                "    ### start rbf_kernel ###\n",
                "\n",
                "    ### end rbf_kernel ###\n",
                "plt.show();\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "_Your comments here..._\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## (g) Experimentation\n",
                "\n",
                "**Disable the `clip` option in the provided `heatmap`\n",
                "function and redraw the heatmap plots for the functions learned by the\n",
                "polynomial kernel and RBF kernel.  Experiment on the provided datasets and\n",
                "describe one potential problem of the polynomial kernel related to what\n",
                "you see here. Does the RBF kernel have such problem?  Compute,\n",
                "compare, comment, and attach the heatmap plots of the polynomial kernel and the\n",
                "RBF kernel on `heart.npz` dataset.**\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "data = np.load('heart.npz')\n",
                "\n",
                "SPLIT = 0.8\n",
                "X = data[\"x\"]\n",
                "y = data[\"y\"]\n",
                "X /= np.max(X)  # normalize the data\n",
                "\n",
                "n_train = int(X.shape[0] * SPLIT)\n",
                "X_train = X[:n_train:, :]\n",
                "X_valid = X[n_train:, :]\n",
                "y_train = y[:n_train]\n",
                "y_valid = y[n_train:]\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# Your code here...\n"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.7.4"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 4
}